The Role of Cognates in Word Acquisition
PhD Defence / Departament de Medicina i Ciències de la Vida
2024-11-03
Average 20-year-old knows ~42,000 lemmas: mental lexicon
First lexical representations at 6-9 months
Vocabulary size norms for 51,800 monolingual children learning 35 distinct languages (Wordbank, Frank et al. 2017)
Hoff et al. (2012): bilinguals acquire words at rates similar to monolinguals
Floccia et al. (2018): CDI responses from 372 bilinguals (UK) learning English + an additional language
English-Dutch (22.14%) > English-Mandarin (1.97%)
Higher lexical similarity, larger vocabulary size
Stronger effect in the additional language (e.g., Dutch, Mandarin)
Pairwise lexical similarity (average Levenshtein similarity across translations in Floccia et al.)
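A minimal sketch of how a Levenshtein similarity between two translations could be computed. Normalising the edit distance by the length of the longer form is an assumption here (the slides do not state the normalisation), and the function names and plain-letter strings are illustrative stand-ins for phonological transcriptions.

```python
def levenshtein_distance(a: str, b: str) -> int:
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    previous = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, char_a in enumerate(a, start=1):
        current = [i]  # deleting the first i symbols of a
        for j, char_b in enumerate(b, start=1):
            substitution = previous[j - 1] + (char_a != char_b)
            insertion = current[j - 1] + 1
            deletion = previous[j] + 1
            current.append(min(substitution, insertion, deletion))
        previous = current
    return previous[-1]


def levenshtein_similarity(a: str, b: str) -> float:
    """Similarity in [0, 1]; assumed normalisation: 1 - distance / longer length."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0  # two empty forms are treated as identical
    return 1.0 - levenshtein_distance(a, b) / longest


cognate = levenshtein_similarity("gat", "gato")      # one insertion over four symbols
non_cognate = levenshtein_similarity("gos", "pero")  # no symbol can be aligned
print(cognate, non_cognate)
```

Averaging this score over all translation pairs of a language pair would give one pairwise lexical similarity value per pair, under the same assumptions.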
Cognates: phonologically-similar translation equivalents
| Cognate | Non-cognate |
|---|---|
| [cat] /ˈgat-ˈga.to/ | [dog] /ˈgos-ˈpe.ro/ |
Some evidence that cognates are acquired earlier than non-cognates (Mitchell, Tsui, and Byers-Heinlein 2023; Bosch and Ramon-Casas 2014)
What mechanisms support cognate facilitation during word acquisition?
Activation spreads across non-selected representations in both languages, through phonological and conceptual links in adults (e.g., Costa, Caramazza, and Sebastian-Galles 2000) and infants (e.g., Von Holzen and Mani 2012; Singh 2014)
Study 1
Submitted, under review
Study 2
In preparation
Cognate beginnings to lexical acquisition: the AMBLA model
\[ \begin{aligned} \definecolor{myred}{RGB}{ 168, 0, 53 } \definecolor{myblue}{RGB}{ 0, 64, 168 } \definecolor{mygreen}{RGB}{0, 168, 87} \definecolor{grey}{RGB}{128, 128, 128} \textbf{For participant } &i \textbf{ and word-form } j \text{ (translation of } j'): \\ {\color{mygreen}\text{Age of Acquisition}_{ij}} &= \{\text{Age}_i \mid {\color{myred}\text{Learning instances}_{ij}} = {\color{myblue}\text{Threshold}} \}\\ {\color{myred}\text{Learning instances}_{ij}} &= \text{Age}_i \cdot \text{Freq}_j \\ \end{aligned} \]
Parameters:
Catalan monolingual child
\[ \begin{aligned} \text{Threshold} = 300 \\ \text{Freq}_j \sim \text{Poisson}(\lambda = 50) \end{aligned} \]
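Worked example, assuming \(\text{Freq}_j\) takes its expected value of 50 and that Age is measured in months (the unit is an assumption; it is not stated here). The word is acquired at the age where the accumulated learning instances reach the threshold:

\[ \begin{aligned} \text{Learning instances}_{ij} &= \text{Age}_i \cdot 50 = 300 \\ \text{Age of Acquisition}_{ij} &= \frac{300}{50} = 6 \end{aligned} \]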
Exposure: proportion of time exposed to the language of word \(j\)
Accumulation of learning instances, a function of Exposure and Frequency.
\[ \begin{aligned} \textbf{For participant } &i \textbf{ and word-form } j \text{ (translation of } j'): \\ \text{Age of Acquisition}_{ij} &= \{\text{Age}_i \mid \text{Learning instances}_{ij} = \text{Threshold} \}\\ \text{Learning instances}_{ij} &= \text{Age}_i \cdot \text{Freq}_j \cdot {\color{myred}\text{Exposure}_{ij}}\\ \end{aligned} \]
Parameters:
Catalan monolingual child
Catalan/Spanish bilingual child
/ˈgos/ (Catalan), 60%
/ˈpe.ro/ (Spanish), 40%
\[ \begin{aligned} \text{Threshold} = 300 \\ \text{Freq}_j \sim \text{Poisson}(\lambda = 50) \end{aligned} \]
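Worked example for the bilingual child, assuming \(\text{Freq}_j = 50\) (the Poisson mean) for both word-forms and Age in months (both are illustrative assumptions). Solving \(\text{Age}_i \cdot \text{Freq}_j \cdot \text{Exposure}_{ij} = \text{Threshold}\) for each word-form:

\[ \begin{aligned} \text{Age of Acquisition}_{i,\text{gos}} &= \frac{300}{50 \cdot 0.6} = 10 \\ \text{Age of Acquisition}_{i,\text{perro}} &= \frac{300}{50 \cdot 0.4} = 15 \end{aligned} \]

With exposure split across two languages, each word-form reaches the threshold later than for the monolingual child, and later still in the lower-exposure language.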
Degree proportional to their phonological similarity (Cognateness)
\[ \begin{aligned} \textbf{For participant } &i \textbf{ and word-form } j \text{ (translation of } j'): \\ \text{Age of Acquisition}_{ij} &= \{\text{Age}_i \mid \text{Learning instances}_{ij} = \text{Threshold} \}\\ \text{Learning instances}_{ij} &= \text{Age}_i \cdot \text{Freq}_j \cdot \text{Exposure}_{ij} + \\ &({\color{myred}\text{Learning instances}_{ij'} \cdot \text{Cognateness}_{j,j'}})\\ \textbf{where:} \\ {\color{myred}\text{Cognateness}_{j,j'}}&{\color{myred} = \text{Levenshtein}(j, j')} \end{aligned} \]
Catalan monolingual child
Catalan/Spanish bilingual child
/ˈgos/ (Catalan), 60%
/ˈpe.ro/ (Spanish), 40%
\[ \begin{aligned} \text{Threshold} = 300 \\ \text{Freq}_j \sim \text{Poisson}(\lambda = 50) \\ \text{Cognateness}_{j,j'} = 0.75 \end{aligned} \]
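A minimal sketch of the cognateness term under the parameters above, not the thesis implementation. Assumptions, all illustrative: Freq is fixed at its mean of 50 for both word-forms, Age is in months, and the cross-language term uses only the instances the translation accumulates from its own input (a first-order reading; how the mutual dependence between the two word-forms is resolved is not specified here). The function name is hypothetical.

```python
def age_of_acquisition(freq, exposure, exposure_translation,
                       cognateness, threshold=300.0):
    """Age at which accumulated learning instances reach the threshold.

    Assumed first-order reading: instances grow linearly with age, at the
    word's own rate plus the translation's own rate weighted by cognateness.
    """
    own_rate = freq * exposure                      # instances per month from own language
    translation_rate = freq * exposure_translation  # instances per month of the translation
    total_rate = own_rate + cognateness * translation_rate
    return threshold / total_rate


# Bilingual child: 60% Catalan, 40% Spanish; Freq fixed at 50 (assumed)
catalan_no_cognate = age_of_acquisition(50, 0.6, 0.4, cognateness=0.0)
spanish_no_cognate = age_of_acquisition(50, 0.4, 0.6, cognateness=0.0)
catalan_cognate = age_of_acquisition(50, 0.6, 0.4, cognateness=0.75)
spanish_cognate = age_of_acquisition(50, 0.4, 0.6, cognateness=0.75)

print(catalan_no_cognate, catalan_cognate)
print(spanish_no_cognate, spanish_cognate)
```

Under these assumptions both word-forms are acquired earlier when cognateness is 0.75, and the gain is larger for the word-form in the lower-exposure language, because it borrows from the faster-accumulating translation.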
Ordinal, multilevel (Bayesian) regression model
Response variable:
p(Comprehension \(<\) Production)
Predictors:
*\(\text{Levenshtein}(j, j')\)
Earlier acquisition for cognates vs. non-cognates
Cognate facilitation moderated by exposure
Only words in the lower-exposure language benefit from cognateness
Cognateness as a candidate mechanism underlying the results of Floccia et al. (2018)
Cross-language facilitation via co-activation of phonologically similar translation equivalents
Is language non-selectivity already present in the initial lexicon?
Developmental trajectories of bilingual spoken word recognition
Some evidence in infants and children (Von Holzen and Mani 2012; Singh 2014).
Methodological pitfalls: “bilingual” task.
Extending the task to test cross-language priming in bilinguals.
Change in order of trial timecourse:
Auditory label before target-distractor images
Length of Catalan and Spanish words
Temporal proximity of prime and target labels

Exp. 1: monolinguals
Replicate within-language phonological interference from Mani and Plunkett (proof of concept)
Exp. 2: monolinguals and bilinguals
If the lexicon is language non-selective, stronger interference in cognate vs. non-cognate trials
N = 112 children (15 longitudinal)
26.36 months \(\pm\) 4.01, 20.03–32.5
English monolinguals, Oxford (United Kingdom) (Mani and Plunkett 2010)
Bayesian GAMMs
Proportion of target looking (PTLT)
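A minimal sketch of how a proportion of target looking is commonly computed for one trial, assuming the usual definition (time on target divided by time on target plus distractor). The function name, argument names and millisecond units are illustrative, not taken from the study's pipeline.

```python
from typing import Optional


def proportion_target_looking(target_ms: float,
                              distractor_ms: float) -> Optional[float]:
    """Share of on-picture looking time spent on the target picture.

    Returns None when the child looked at neither picture, since the
    proportion is undefined for that trial.
    """
    total_ms = target_ms + distractor_ms
    if total_ms == 0:
        return None
    return target_ms / total_ms


# Hypothetical trial: 600 ms on the target, 400 ms on the distractor
ptlt = proportion_target_looking(600, 400)
print(ptlt)
```

Values above 0.5 indicate a preference for the target, which is why chance-level looking is the usual reference point for word recognition in this kind of task.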
N = 162 children (81 longitudinal)
25.36 months \(\pm\) 4.01, 20.03–32.5
Bayesian GAMMs
Proportion of target looking (PTLT)
Successful word recognition across:
No evidence of priming effects, within or across languages
Most likely due to design issues
Cognateness facilitates word acquisition in the lower-exposure language
Candidate mechanism behind bilingual vocabulary growth
AMBLA: Cross-language accumulation of learning instances
Language non-selectivity in the initial lexicon: pending severe testing
Explanation for Floccia et al. (2018)
Asymmetry in adult models of lexical processing
AMBLA: natural extension of the Standard Model of language acquisition? (Kachergis, Marchman, and Frank 2022)
Design caveats
Generalisability? Language pairs with fewer cognates
Does cognateness impact the acquisition of other grammatical categories (e.g., verbs, adjectives)?
Word acquisition vs. word learning
Thanks!
Classification of participants into monolinguals and bilinguals
Cognate contents in the aggregated vocabulary
Aggregated vocabularies might conceal facilitation effects
MCMC convergence for the model in Study 1
Study 2 participant receptive vocabulary sizes across ages and language profiles
MCMC convergence for model in Study 1 (Exp. 1)
MCMC convergence for model in Study 2 (Exp. 1)